Good morning, everyone. Glad you are all here bright and early. I'm counting the days till the TAs outnumber the students. They'll show up.

We return to a familiar story. This is part two: The Empire Strikes Back. Last time, our adversary, the graph, came to us with a problem. We had a source, we had a directed graph, and we had weights on the edges, and they were all nonnegative. And there was happiness. And we triumphed over the Empire by designing Dijkstra's algorithm and very efficiently finding single-source shortest paths, the shortest path weight from s to every other vertex.

Today, however, the Death Star has a new trick up its sleeve: we potentially have negative weights. And we're going to have to somehow deal with, in particular, negative weight cycles. We saw that when we have a negative weight cycle, we can just keep going around, and around, and around, and go back in time farther, and farther, and farther. We can get arbitrarily far back in the past, and so there's no shortest path, because whatever path you take, you can get a shorter one. So we want to address that issue today, and we're going to come up with a new algorithm, actually simpler than Dijkstra but not as fast, called the Bellman-Ford algorithm.
It's going to allow negative weights, and in some sense allow negative weight cycles, although maybe not as much as you might hope. We have to leave room for a sequel, of course.

OK, so the Bellman-Ford algorithm, invented by two guys, as you might expect, computes the shortest path weights. It makes no assumption about the weights: the weights are arbitrary, and it's going to compute the shortest path weights. So, remember this notation: delta of s, v is the weight of the shortest path from s to v. s is called the source vertex, and we want to compute these weights for all vertices v. The claim is that computing from s to everywhere is no harder than computing from s to a particular location. So, we're going to do it for all of them. That's still going to be the case here.

And it allows negative weights. That's the good case, but there's an alternative, which is that Bellman-Ford may just say: oops, there's a negative weight cycle. And in that case it will just say so. It reports that a negative weight cycle exists, and therefore some of these deltas are minus infinity. And that seems weird. So, Bellman-Ford as we'll present it today is intended for the case where there are no negative weight cycles, which is more intuitive. It sort of allows them, but it will just report them.
In that case, it will not give you delta values. You can change the algorithm to give you delta values in that case, but we are not going to see it here. So, an exercise, after you see the algorithm, is: compute these deltas in all cases. It's not hard to do, but we don't have time for it here.

So, here's the algorithm. It's pretty straightforward. As I said, it's easier than Dijkstra. It's a relaxation algorithm: the main thing it does is relax edges, just like Dijkstra. So, we'll be able to use a lot of lemmas from Dijkstra, and the proof of correctness will be three times shorter, because the first two thirds we already have from Dijkstra. But I'm jumping ahead a bit.

The first part is initialization. Again, d of v will represent the estimated distance from s to v, and we're going to be updating those estimates as the algorithm goes along. Initially, d of s is zero, which now may not be the right answer, conceivably. Everyone else is infinity, which is certainly an upper bound. These are both upper bounds on the true distance, so that's fine. That's initialization, just like before.

And now we have a main loop, which happens V minus one times. We're not actually going to use the index i.
It's just a counter. And we're just going to look at every edge and relax it. It's a very simple idea. If you learn about relaxation, this is the first thing you might try. The question is when you stop. It's sort of like a friend of mine: when he was six years old, he would claim, oh, I know how to spell banana, I just don't know when to stop. Same thing with relaxation.

This is our relaxation step, just as before. We look at the edge and see whether it violates the triangle inequality according to our current estimates: we know the distance from s to v should be at most the distance from s to u plus the weight of the edge from u to v. If it isn't, we set it equal. We've proved that this is always an OK thing to do: these d of v's never get too small if we do a bunch of relaxations.

So, the idea is: you take every edge, you relax it. I don't care in which order. Just relax every edge, once each. And do that V minus one times. The claim is that that should be enough if you have no negative weight cycles.

So, if there's a negative weight cycle, we need to figure it out. And we'll do that in a fairly straightforward way, which is that we're going to do exactly the same thing. This is outside the for loop here.
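The relaxation step just described can be sketched in Python. Here `d` holds the distance estimates and `w` the edge weights; these names and the dict representation are my choices, not the lecture's:

```python
def relax(u, v, w, d):
    # If going through u gives a shorter route to v than our current
    # estimate, update the estimate. This is always safe: d[v] never
    # drops below the true shortest-path weight delta(s, v).
    if d[v] > d[u] + w[(u, v)]:
        d[v] = d[u] + w[(u, v)]

# Example: current estimates d[u] = 2, d[v] = 10, edge weight 3.
d = {"u": 2, "v": 10}
w = {("u", "v"): 3}
relax("u", "v", w, d)
print(d["v"])  # 5
```

If the triangle inequality already holds for the edge, the call does nothing, which is why relaxing in any order is harmless.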
We'll have the same for loop, over each edge in our graph, and we'll try to relax it. And if you can still relax some edge, the claim is that there has to be a negative weight cycle. So this is the main thing that needs proof.

OK, and that's the algorithm. So the claim is that at the end we should have d of v equal to delta of s comma v for every vertex v. If we don't find a negative weight cycle according to this rule, then we should have all the shortest path weights. That's the claim.

Now, the running time is very easy to analyze, so let's start with the running time. We can compare it to Dijkstra, which is over here. What is the running time of this algorithm? V times E, exactly. OK, I'm going to assume, because it's pretty reasonable, that V and E are both positive. Then it's V times E. So, this is a fair amount slower than Dijkstra's algorithm. There it is: E plus V log V, which, ignoring the logs, is pretty much linear time. Here we have something that's at least quadratic in V, assuming your graph is connected. So, it's slower, but it's going to handle these negative weights. Dijkstra can't handle negative weights at all.
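Putting the pieces together, here is a minimal sketch of the whole algorithm as just described: initialize, do V minus one rounds of relaxing every edge, then make one more pass to detect a negative weight cycle. Representing the graph as a list of (u, v, weight) triples is my choice for the sketch:

```python
def bellman_ford(vertices, edges, s):
    # edges: list of (u, v, weight) triples; s: source vertex.
    # Initialization: d[s] = 0, everyone else infinity.
    d = {v: float("inf") for v in vertices}
    d[s] = 0
    # Main loop: |V| - 1 rounds, each relaxing every edge once.
    for _ in range(len(vertices) - 1):
        for u, v, weight in edges:
            if d[u] + weight < d[v]:
                d[v] = d[u] + weight
    # One extra pass: if any edge still relaxes, there must be
    # a negative weight cycle.
    for u, v, weight in edges:
        if d[u] + weight < d[v]:
            raise ValueError("negative weight cycle")
    return d

# A graph with a negative edge but no negative cycle:
d = bellman_ford(["s", "a", "b"],
                 [("s", "a", 4), ("a", "b", -2), ("s", "b", 3)], "s")
print(d)  # {'s': 0, 'a': 4, 'b': 2}
```

Both nested loops touch every edge once per round, which is where the V times E running time comes from.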
So, let's do an example to make it clear why you might hope this algorithm works. And then we'll prove that it works, of course. But the proof will be pretty easy. So, I'm going to draw a graph that has negative weights, but no negative weight cycles, so that I get an interesting answer.

Good. The other thing I need in order to make the output of this algorithm well defined, since it depends on the order in which you visit the edges, is an ordering. So I'm going to assign an arbitrary order to these edges. I could just ask you for an order, but to be consistent with the notes, I'll put an ordering on it. Let's say I put number four here, meaning that's the fourth edge I'll visit. It doesn't matter, but it will affect what happens during the algorithm for a particular graph.

Did I get them all? One, two, three, four, five, six, seven, eight. OK. And my source is going to be A. And, that's it. So, I want to run this algorithm. I'm just going to initialize everything: I set the estimate for s to be zero, and everyone else to be infinity.
And to give me some notion of time, over here I'm going to write down what all of these d values are as the algorithm proceeds, because I'm going to start crossing them out and rewriting them, and the figure will get a little bit messier. But we can keep track of it over here. It's initially zero and infinities.

Yeah? It doesn't matter. For the algorithm, you can go through the edges in a different order every time if you want. We'll prove that, but here I'm going to go through the same order every time. Good question. It turns out it doesn't matter here.

OK, so here's the starting point. Now I'm going to relax every edge. There are going to be a lot of edges here that don't do anything. I try to relax edge number one. I'd say, well, I know how to get from s to B with weight infinity, and infinity plus two is how I can get from s to E. Well, infinity plus two is not much better than infinity. So I don't do anything; I don't update this to infinity. I mean, infinity plus two sounds even worse, but infinity plus two is infinity. OK, that's edge number one.
So, no relaxation. Edge number two, same deal. Edge number three, same deal. With edge number four, we start to get something interesting, because I have a finite value here that says I can get from A to B using a total weight of minus one. So that seems good. I'll write down minus one here, and update B to minus one. The rest stay the same. So, I'm just going to keep doing this over and over. That was edge number four.

With number five, we also get a relaxation: four is better than infinity, so C gets a value of four. Then we get to edge number six. Infinity plus five is worse than four, so no relaxation there. Edge number seven is interesting, because I have a finite value here, minus one, plus the weight of this edge, which is three. That's a total of two, which is actually better than four. So this route, A, B, C, is actually better than the route I found a second ago. So, this is now a two. This is all happening in one iteration of the main loop; we actually found two good paths to C, and one was better than the other. OK, that was edge number seven, and edge number eight is over here. It doesn't matter.

So, that was round one of this outer loop, the first value of i: i equals one. OK, now we continue. Just keep going. So, we start with edge number one.
Now, minus one plus two is one. That's better than infinity. It'll start speeding up; it's repetitive, and it's actually not too much longer until we're done. Number two, this is an infinity, so we don't do anything. Number three: minus one plus two is one, better than infinity. This is vertex D, via edge number three. Number four we've already done; nothing changed. Number five: this is where we see the path of weight four again, but that's worse than two, so we don't update anything. Number six: one plus five is six, which is bigger than two, so no good. Go around this way. Number seven: same deal. Number eight is interesting. We have a weight of one here and a weight of minus three here, so the total is minus two, which is better than one. So, that was D.

And, I believe that's it. So that was definitely the end of that round, i equals two, because we just looked at the eighth edge. And, I'll cheat and check. Indeed, that is the last thing that happens. We can check the couple of outgoing edges from D, because that's the only vertex whose value just changed, and there are no more relaxations possible. So, that was in two rounds, and the claim is we got all the shortest path weights.
The algorithm would actually loop four times to guarantee correctness, because we have five vertices here, and one less than that is four. So, in fact, in the execution here there are two more blank rounds at the bottom; nothing happens. But, what the hell?

OK, so that is Bellman-Ford. I mean, it's certainly not doing anything wrong. The question is, why is it guaranteed to converge in V minus one steps, unless there is a negative weight cycle? Question? Right, so that's an optimization. If you discover a whole round in which nothing happens, and you can keep track of that in the algorithm, you can stop. In the worst case, it won't make a difference, but in practice, you probably want to do that. Yeah? Good question.

All right, so some simple observations. We're only doing relaxation, so we can use a lot of our analysis from before. In particular, the d values are only decreasing monotonically. As we cross out values here, we are always making them smaller, which is good.

Another nifty thing about this algorithm is that you can run it even in a distributed system. If this is some actual network, some computer network, and these are machines, and they're communicating by these links, I mean, it's a purely local thing.
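The optimization just mentioned, stopping as soon as a whole round changes nothing, is a small tweak to the main loop. A sketch, again with the graph as a list of (u, v, weight) triples of my choosing:

```python
def bellman_ford_early_exit(vertices, edges, s):
    # Same V - 1 rounds, but stop early once a full round relaxes
    # nothing: at that point the d values have converged.
    d = {v: float("inf") for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):
        changed = False
        for u, v, weight in edges:
            if d[u] + weight < d[v]:
                d[v] = d[u] + weight
                changed = True
        if not changed:  # a blank round: nothing left to do
            break
    return d
```

On the five-vertex example from the board, which converges in two rounds, this would stop during the third, blank round instead of running all four; in the worst case it saves nothing.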
Relaxation is a local thing. You don't need any global strategy. And you're asking about whether we can do a different order in each step: well, yeah, you could just keep relaxing edges, and keep relaxing edges, and just keep going for the entire lifetime of the network. And eventually, you will find shortest paths. So, this algorithm is guaranteed to finish in V rounds in a distributed system. It might be more asynchronous, and it's a little harder to analyze, but it will still work eventually; it's guaranteed to converge. And so, Bellman-Ford is used on the Internet for finding shortest paths.

OK, so let's finally prove that it works. This should only take a couple of boards. So let's suppose we have a graph and some edge weights that have no negative weight cycles. Then the claim is that we terminate with the correct answer: Bellman-Ford terminates with all of these d of v values set to the delta values, for every vertex.

OK, the proof is going to be pretty immediate using the lemmas that we had from before, if you remember them. So, we're just going to look at every vertex separately. I'll call the vertex v. The claim is that this holds by the end of the algorithm.
So, remember, what we need to prove is that at some point, d of v equals delta of s comma v, because we know it decreases monotonically, and we know that it never gets any smaller than the correct value, because relaxations are always safe. So, we just need to show that at some point this holds, and then it will hold at the end. So, by monotonicity of the d values, and by correctness part one, which was that the d of v's are always greater than or equal to the deltas, we only need to show that at some point we have equality. So that's our goal.

What we're going to do is just look at v, and the shortest path to v, and see what happens to the algorithm relative to that path. So, I'm going to name the path. Let's call it p. It starts at vertex v_0, goes to v_1, v_2, and so on, and ends at v_k. And this is not just any shortest path, but one that starts at s. So, v_0 is s, and it ends at v. I'm giving these names to s and v so I can talk about the path more uniformly. So, this is a shortest path from s to v.

Now, I also want it to be not just any shortest path from s to v, but, among all shortest paths from s to v, one with the fewest possible edges. OK, so shortest here means in terms of the total weight of the path.
Subject to being shortest in weight, I want it to also be shortest in the number of edges. And the reason I want that is to be able to conclude that p is a simple path, meaning that it doesn't repeat any vertices.

Now, can anyone tell me why I need to assume that the number of edges is the smallest possible in order to guarantee that p is simple? The claim is that not all shortest paths are necessarily simple. Yeah? Right, I can have a zero weight cycle, exactly. So, in fact, in the theorem here, we're assuming that there are no negative weight cycles, but there might be zero weight cycles still. And given a zero weight cycle, you can put it in the middle of any shortest path to make it arbitrarily long, repeating vertices over and over. That's going to be annoying. What I want is that p is simple. And I can guarantee that essentially by shortcutting: if ever I take a zero weight cycle, I throw it away. And this is one mathematical way of doing that.

OK, now what else do we know about this shortest path? Well, we know that subpaths of shortest paths are shortest paths. That's optimal substructure. So, we know what the shortest path from s to v_i is, sort of inductively.
It's the weight of that path, which is, in particular, the shortest path weight from s to v_i minus one, plus the weight of the last edge, from v_i minus one to v_i. So, this is by optimal substructure, as we proved last time. OK, and I think that's pretty much the warm-up.

So, I want to do this inductively in i: start out with v_0, and go up to v_k. The first question is, what is d of v_0, which is s? What is d of the source? Well, certainly at the beginning of the algorithm, it's zero; that's what we set it to, and it only goes down from there. So, it's certainly at most zero. The real question is, what is delta of s comma v_0, the shortest path weight from s to s? It has to be zero, otherwise you have a negative weight cycle, exactly. My favorite answer: zero. If we had another, shorter path from s to s, I mean, that is a cycle, a negative weight cycle. So, it's got to be zero. So, these are actually equal at the beginning of the algorithm, which is great. That means they will be for all time, because, as we just argued up here, d only goes down and never gets too small. So, we have d of v_0 set to the right thing. Great: good for the base case of the induction.
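In symbols, the two board facts just established are (my transcription, writing d[.] for the estimates):

```latex
% Optimal substructure: a prefix of a shortest path is a shortest path,
% so for each edge (v_{i-1}, v_i) on p:
\delta(s, v_i) = \delta(s, v_{i-1}) + w(v_{i-1}, v_i)

% Base case: v_0 = s, and with no negative weight cycles
\delta(s, v_0) = \delta(s, s) = 0 = d[v_0]
```

The induction that follows pushes equality d[v_i] = delta(s, v_i) one edge further along p in each round.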
Of course, what we really care about is v_k, which is v. So, let's talk about the v_i's inductively, and then we'll get v_k as a result. So, yeah, let's do it by induction. That's more fun.

Let's say that d of v_i is equal to delta of s comma v_i after i rounds of the algorithm. This i is actually referring to the i in the algorithm here. These are rounds: one round is an entire execution of the relaxation of all the edges. So, this is certainly true for i equals zero. We just proved that: after zero rounds, at the beginning of the algorithm, d of v_0 equals delta of s comma v_0.

OK, so now, that's not really what I wanted, but OK, fine. Now we'll prove it for d of v_i plus one. Generally, I recommend you assume something. In fact, why don't I follow my own advice and change it? It's usually nicer to think of induction as recursion. So, you assume that this is true, let's say, for all j less than the i that you care about, and then you prove it for d of v_i. It's usually a lot easier to think about it that way. In particular, you can use strong induction, for all j less than i. Here, we're only going to need it for one less: we have some relation between i and i minus one here, in terms of the deltas.
And so, we want to argue something about the d values. OK, well, let's think about what's going on here. We know that, after i minus one rounds, we have this inductive hypothesis: d of v_i minus one equals delta of s comma v_i minus one. And we want to conclude that, after i rounds, so we have one more round to do it, d of v_i has the right answer, delta of s comma v_i.

Does that look familiar at all? In this round, we relax every edge. In particular, at some point, we have to relax the edge from v_i minus one to v_i. We know that this path consists of edges; that's the definition of a path. So, during the i'th round, we relax every edge, so we'd better relax the edge from v_i minus one to v_i. And, what happens then? It's a test of memory. Quick, the Death Star is approaching.

So, if we have the correct value for v_i minus one, and we relax an outgoing edge from there, and that edge is an edge of the shortest path from s to v_i, what do we know? d of v_i becomes the correct value, delta of s comma v_i. This was called the correctness lemma last time, one of the things we proved about Dijkstra's algorithm, but it was really just a fact about relaxation. And it was a pretty simple proof, and it comes from this fact.
We know the shortest path weight is this. So, certainly d of v_i was at least this big, and let's suppose it's greater, or otherwise we're done. We know d of v_i minus one is set to this. And so, this is exactly the condition that's being checked in the relaxation step. The d of v_i value will be greater than this, let's suppose, and then we'll set it equal to this. And that's exactly delta of s comma v_i. So, when we relax that edge, we've got to set it to the right value.

So, this is the end of the proof, right? It's very simple. The point is, you look at your shortest path. Here it is. If we assume there are no negative weight cycles, this has the correct value initially: d of s is going to be zero. After the first round, you've got to relax this edge, and then you get the right value for that vertex. After the second round, you've got to relax this edge, which gets you the right d value for this vertex, and so on. And so, no matter which shortest path you take, you can apply this analysis. And you know that, if the length of this path is k edges, as we assumed here, then after k rounds you've got to be done.

OK, so this was not actually the end of the proof. Sorry. This means that after k rounds, we have the right answer for v_k, which is v.
So, the only question is how big could k be? The claim made by the algorithm is that you only need V minus one rounds. And indeed, the number of edges in a simple path in a graph is at most the number of vertices minus one. k is at most V minus one because p is simple. So, that's why we had to assume that it wasn't just any shortest path; it had to be a simple one, so it didn't repeat any vertices. So there are at most V vertices in the path, and therefore at most V minus one edges in the path.

OK, and that's all there is to Bellman-Ford. So: pretty simple in correctness. Of course, we're using a lot of the lemmas that we proved last time, which makes it easier.

OK, a consequence of this theorem, or of this proof, is that if Bellman-Ford fails to converge, then there has to be a negative weight cycle. And that's what the algorithm is checking: whether relaxation still requires work after these V minus one rounds. Right, the end of this algorithm is: run another round, a V'th round, and see whether anything changes. So, we say that the algorithm fails to converge if something still changes after the V minus one steps, or rounds. Then, there has to be a negative weight cycle. OK, this is just the contrapositive of what we proved.

We proved that if you assume there's no negative weight cycle, then we know that d of s is zero, and then all this argument says is you've got to converge after V minus one rounds. There can't be anything left to do once you've reached the shortest path weights, because the d values decrease monotonically and can never go below the true shortest path weights; you can never go through the floor. So, if you fail to converge somehow after V minus one rounds, you've got to have violated an assumption. The only assumption we made was that there's no negative weight cycle. So, this tells us that Bellman-Ford is actually correct: when it says that there is a negative weight cycle, it indeed means it. It's true.

OK, and you can modify Bellman-Ford in that case to run a little longer and find where all the minus infinities are. And that is, in some sense, one of the things you have to do in your problem set, I believe. So, I won't cover it here. But it's a good exercise in any case to figure out how you would find where the minus infinities are. What are all the vertices reachable from a negative weight cycle? Those are the ones that have minus infinities. OK, so you might say, well, that was awfully fast. Actually, it's not over yet. The episode has not yet ended.
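Putting the pieces together, V minus one rounds of relaxing every edge plus one extra detection round, a sketch of Bellman-Ford might look like this (the vertex and edge-list representations are my own choices for illustration):

```python
import math

def bellman_ford(vertices, edges, s):
    """Single-source shortest paths with negative weights allowed.
    edges is a list of (u, v, weight) tuples.
    Returns (d, has_negative_cycle)."""
    d = {v: math.inf for v in vertices}
    d[s] = 0
    # |V| - 1 rounds: after round i, d[v] is correct for every vertex
    # whose shortest path from s uses at most i edges.
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    # One extra round: if any edge can still be relaxed, the algorithm
    # failed to converge, so a negative-weight cycle must exist.
    for u, v, w in edges:
        if d[u] + w < d[v]:
            return d, True
    return d, False
```

(Extending this to mark every vertex reachable from a negative weight cycle with minus infinity is the exercise mentioned above.)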
We're going to use Bellman-Ford to solve even bigger and greater shortest path problems. In the remainder of today's lecture, we will see it applied to a more general problem, in some sense, called linear programming. And in the next lecture, we'll really use it to do some amazing stuff with all-pairs shortest paths. Let's go over here.

So, our goal, although it won't be obvious today, is to be able to compute the shortest paths between every pair of vertices, which we could certainly do at this point just by running Bellman-Ford V times. OK, but we want to do better than that, of course. And that will be the climax of the trilogy.

OK, today we just discovered who Luke's father is. So, it turns out the father of shortest paths is linear programming. Actually, simultaneously the father and the mother, because programs do not have gender. OK, my father likes to say, we both took improv comedy lessons, so we have degrees in improvisation. And he said, you know, we went to improv classes in order to learn how to make our humor better. And the problem is, it didn't actually make our humor better. It just made us less afraid to use it. [LAUGHTER] So, you are subjected to all this improv humor. I didn't see the connection to Luke's father, but there you go.
OK, so, linear programming is a very general problem, a very big tool. Has anyone seen linear programming before? OK, one person. And I'm sure that if you ever, at some time in your life, do anything vaguely computing- or optimization-related, linear programming will come up at some point. It's a very useful tool.

You're given a matrix and two vectors: not too exciting yet. What you want to do is find a vector. This is a very dry description; we'll see what makes it so interesting in a moment. So, you want to maximize some objective, and you have some constraints. And they're all linear. So, the objective is a linear function in the variables x, and your constraints are a bunch of linear constraints, inequality constraints; that's what makes it interesting. It's not just solving a linear system as you've seen in linear algebra, or whatever. Or, of course, it could be that there is no such x.

OK: vaguely familiar, you might think, to the theorem about Bellman-Ford. And we'll show that there's some kind of connection here: either you want to find something, or show that it doesn't exist. Well, that's still a pretty vague connection, but I also want to maximize something, or sort of minimize, as in shortest paths, OK, somewhat similar. We have these constraints.
So, yeah. This may be intuitive to you, I don't know. I prefer a more geometric picture, and I will try to draw such a geometric picture. I've never tried to do this on a blackboard, so it should be interesting. I think I'm going to fail miserably. It sort of looks like a dodecahedron, right? Sort of, kind of, not really. A bit rough on the bottom, OK.

So, if you have a bunch of linear constraints, this is supposed to be in 3-D. Now I've labeled it. It's now in 3-D. Good. So, you have these linear constraints. Each one turns out to define a hyperplane in n dimensions. OK, so you have this space here that's three-dimensional, so n equals three. And these hyperplanes, where you look at one side of each hyperplane, that's the less-than-or-equal-to; if you take the intersection, you get some convex polytope or polyhedron. In 3-D, you might get a dodecahedron or whatever.

And your goal: you have some objective vector c, let's say, up. Suppose that's the c vector. Your goal is to find the highest point in this polytope. So here, it's maybe this one. OK, this is the target. This is the optimal x. That is the geometric view. If you prefer the algebraic view, you want to maximize c transpose times x. So, this is m.
This is n. Check that the dimensions work out. So that's saying you want to maximize the dot product; you want to maximize the extent to which x is in the direction c. And you want to maximize that subject to some constraints, which look something like this, maybe. So, this is A, and it's m by n. You want to multiply it by something of height n; that's x. Let me put x down here, n by one. And it should be less than or equal to something of this height, which is b, the right-hand side.

OK, that's the algebraic view, which is there to check that all the dimensions are working out. But you can read these off: each row here, when multiplied by this column, gives you one value here. And that's just a linear constraint on all the x_i's. So, you want to maximize this linear function of x_1 up to x_n subject to these constraints, OK? Pretty simple, but pretty powerful in general.

So, it turns out that you can formulate a huge number of problems, such as shortest paths, as a linear program. So, it's a general tool. And in this class, we will not cover any algorithms for solving linear programming. It's a bit tricky. I'll just mention that they are out there. So, there are many efficient algorithms, and lots of code that does this. It's a very practical setup.
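Just to ground the dimension-counting, here is a tiny sketch that checks whether a candidate x satisfies Ax <= b row by row (plain Python lists stand in for the matrix and vectors; note this only verifies a given solution, it does not find one):

```python
def satisfies(A, x, b):
    # A is m x n (a list of m rows), x has length n, b has length m.
    # Each row of A, dotted with the column x, yields one value, which
    # must be at most the corresponding entry of b.
    for row, bound in zip(A, b):
        if sum(a * xi for a, xi in zip(row, x)) > bound:
            return False
    return True
```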
So, there are lots of algorithms to solve LPs, linear programs; linear programming is usually called LP. And I'll mention a few of them. There's the simplex algorithm, one of the first; I think it is the first. There's the ellipsoid algorithm. There are interior point methods, and there's random sampling. I'll just say a little bit about each of these, because we're not going to talk about any of them in depth.

The simplex algorithm: this is, I mean, one of the first algorithms in the world in some sense, certainly one of the most popular. It's still used today. Almost all linear programming code uses the simplex algorithm. It happens to run in exponential time in the worst case, so it's actually pretty bad theoretically. But in practice, it works really well, and there is some recent work that tries to understand this. It's still exponential in the worst case, but it's practical. There's actually an open problem whether there exists a variation of simplex that runs in polynomial time. But I won't go into that. That's a major open problem in this area of linear programming.

The ellipsoid algorithm was the first algorithm to solve linear programming in polynomial time. So, for a long time, people didn't know. Around this time, people started realizing polynomial time is a good thing. That happened around the late '60s. Polynomial time is good. And the ellipsoid algorithm was the first one to achieve it. It's a very general algorithm, and very powerful theoretically: completely impractical, but it's cool. It lets you do things like solve a linear program that has exponentially many constraints in polynomial time. You get all sorts of crazy things. So, I'll just say it's polynomial time. If I can't say something nice about it, I won't say it at all: it's impractical.

Interior point methods are sort of a mixture. They run in polynomial time, you can guarantee that, and they are also pretty practical. And there's sort of this competition these days about whether simplex or interior point is better. I don't know what the answer is today, but a few years ago they were neck and neck.

And random sampling is a brand new approach. This is just from a couple of years ago, by two MIT professors, Dimitris Bertsimas and Santosh Vempala; I guess one of them is in applied math. So, just to show you, there's active work in this area. People are still finding new ways to solve linear programs. This is completely randomized, very simple, and very general. It hasn't been implemented, so we don't know how practical it is yet. But it has potential. OK: pretty neat.
OK, we're going to look at a somewhat simpler version of linear programming. The first restriction we are going to make is actually not much of a restriction. But nonetheless we will consider it; it's a little bit easier to think about. So here, we had some polytope, and we wanted to maximize some objective. In a feasibility problem, I just want to know: is the polytope empty? Can you find any point in that polytope? Can you find any set of values, x, that satisfies these constraints? OK, so there's no objective c; just find x such that Ax is less than or equal to b.

OK, it turns out you can prove a very general theorem: if you can solve linear feasibility, you can also solve linear programming. We won't prove that here, but this is actually no easier than the original problem, even though it feels easier, and it's easier to think about. As I was just saying: actually no easier than LP.

OK, the next restriction we're going to make is a real restriction, and it simplifies the problem quite a bit. And that's to look at difference constraints. And if all this seemed a bit abstract so far, we will now ground ourselves a little bit. A system of difference constraints is a linear feasibility problem; so, it's an LP where there's no objective.
And it's with a restriction: each row of the matrix A has one 1, one -1, and everything else in the row is zero. OK, in other words, each constraint has a very simple form: it involves two variables and some number. So, we have something like x_j minus x_i is less than or equal to w_ij. So, this is just a number, and these are two variables. There's a minus sign, no coefficients, no other x_k's appear, just two of them. And you have a bunch of constraints of this form, one per row of the matrix.

Geometrically, I haven't thought about what this means. I think it means the hyperplanes are pretty simple. Sorry I can't do better than that. It's a little hard to see this in high dimensions. But it will start to correspond to something we've seen, namely the board that it's next to, very shortly.

OK, so let's do a very quick example, mainly to have something to point at. Here's a very simple system of difference constraints, OK, and a solution. Why not? It's not totally trivial to solve this, but here's a solution. And the only thing to check is that each of these constraints is satisfied: x_1 minus x_2 is three, which is less than or equal to three, and so on. There could be negative values; there could be positive values. It doesn't matter.

I'd like to transform this system of difference constraints into a graph, because we know a lot about graphs. So, we're going to call this the constraint graph, and it's going to represent these constraints. How do I do it? Well, I take every constraint, which in general looks like this, and I convert it into an edge. OK, so if I write it as x_j minus x_i is less than or equal to some w_ij, and w seems suggestive of weights, that's exactly why I called it w, then I'm going to make that an edge from v_i to v_j. So, the order flips a little bit. And the weight of that edge is w_ij. So, just do that. Make n vertices. So, the number of vertices equals n, and the number of edges equals the number of constraints, which is m, the height of the matrix. And just transform.

So, for example, here we have three variables, so we have three vertices: v_1, v_2, v_3. We have x_1 minus x_2, so we have an edge from v_2 to v_1 of weight three. We have x_2 minus x_3, so we have an edge from v_3 to v_2 of weight minus two. And we have x_1 minus x_3, so we have an edge from v_3 to v_1 of weight two. I hope I got the directions right. Yep.
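That translation is mechanical, so here it is as a code sketch, applied to the three-constraint example just described (I'm encoding each constraint x_j minus x_i <= w as a tuple (j, i, w); that encoding is my own convention, not the lecture's):

```python
def constraint_graph(constraints):
    # Each constraint x_j - x_i <= w becomes an edge v_i -> v_j of
    # weight w. Note the order flips: the edge runs from the vertex of
    # the subtracted variable to the vertex of the other variable.
    return [(i, j, w) for (j, i, w) in constraints]
```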
So, there it is, a graph: currently no obvious connection to shortest paths, right? But in fact, this constraint is closely related to shortest paths. So let me just rewrite it. You could say, well, x_j is less than or equal to x_i plus w_ij. Or, you could think of it as d[j] less than or equal to d[i] plus w_ij. This is a conceptual balloon. Look awfully familiar? A lot like the triangle inequality, a lot like relaxation. So, there's a very close connection between these two problems, as we will now prove.

So, we're going to have two theorems, and they're going to look similar to the correctness of Bellman-Ford in that they talk about negative weight cycles. Here we go. It turns out, I mean, we have this constraint graph. It can have negative weights; it can have positive weights. It turns out what matters is whether you have a negative weight cycle. So, the first thing to prove is that if you have a negative weight cycle, then something bad happens. OK, what bad thing could happen? Well, we're just trying to satisfy this system of constraints. So, the bad thing is that there might not be any solution; these constraints may be infeasible. And that's the claim. The claim is that this is actually an if and only if. But first we'll prove the "if".
If you have a negative weight cycle, you're doomed: the difference constraints are unsatisfiable. That's a more intuitive way to say it. In the LP world, they call it infeasible, but unsatisfiable makes a lot more sense: there's no way to assign the x_i's so as to satisfy all the constraints simultaneously.

So, let's just take a look. Consider a negative weight cycle. It starts at some vertex, goes through some vertices, and at some point comes back. I don't care whether it repeats vertices, just as long as this cycle, from v_1 to v_1, is a negative weight cycle, strictly negative weight. OK, and what I'm going to do is just write down all the constraints. Each of these edges corresponds to a constraint, which must be in the set of constraints, because that's how we built the graph. So, these are all edges. Let's look at what they give us. So, we have an edge from v_1 to v_2. That corresponds to: x_2 minus x_1 is at most something, namely w_12. Then we have x_3 minus x_2, at most the weight w_23, and so on. And eventually we get up to x_k minus x_(k-1); that's this edge, at most w_(k-1),k. And lastly we have this edge, which wraps around: x_1 minus x_k, at most w_k1, if I've got the signs right. Good, so here's a bunch of constraints.

What do you suggest I do with them? Anything interesting about these constraints, say, the left-hand sides? Sorry? It sounded like the right word. What was it? Telescopes, yes, good. Everything cancels. If I added these up, there's an x_2 and a minus x_2; there's a minus x_1 and an x_1; there's a minus x_k and an x_k. Everything here cancels if I add up the left-hand sides. So, what happens if I add up the right-hand sides? Over here I get zero, my favorite answer. And over here, we get the weights of all the edges in the negative weight cycle, which is the weight of the cycle, which is negative. So, zero is strictly less than zero: contradiction.

Contradiction: wait a minute, we didn't assume anything that was false. So, it's not really a contradiction in the mathematical sense. We didn't contradict the world. We just said that these constraints are contradictory. In other words, if you pick any values of the x_i's, there is no way that these can all be true, because then you would get a contradiction. So, it's impossible for these things to be satisfied by any real x_i's. So, they must be unsatisfiable. Let's say there's no satisfying assignment, to be a little more precise: x_1 up to x_n, no weights. Can we satisfy those constraints?
No, because they add up to zero on the left-hand side, and to something negative on the right-hand side. OK, so that's an easy proof. The reverse direction will be only slightly harder.

OK, so, cool. We have this connection. So the motivation is: suppose you want to solve these difference constraints. And we'll see one such application. I Googled around for difference constraints. There are a fair number of papers that care about difference constraints, and they all use shortest paths to solve them. So, if we can prove a connection between shortest paths, which we know how to compute, and difference constraints, then we'll have something cool. And next class we'll see even more applications of difference constraints. It turns out they're really useful for all-pairs shortest paths.

OK, but for now let's just prove this equivalence and finish it off. So, the reverse direction is: if there's no negative weight cycle in this constraint graph, then the system had better be satisfiable. The claim is that these negative weight cycles are the only barriers to finding a solution to these difference constraints. I have this feeling it's somewhere here. I had to talk about the constraint graph. Good. Satisfied, good.
So, here we're going to see a technique that is very useful when thinking about shortest paths. And it's a bit hard to guess, especially if you haven't seen it before. This is useful in problem sets, and in quizzes, and finals, and everything. So, keep this in mind. I mean, I'm using it to prove this rather simple theorem, but the idea of changing the graph, and I'm going to call this constraint graph G, changing the graph is a very powerful idea.

So, we're going to add a new vertex, s, our source. Use the source, Luke. And we're going to add a bunch of edges from s, because being a source, it had better be connected to some things. So, we are going to add a weight-zero edge from s to every other vertex in the constraint graph. Those vertices are called v_1 up to v_n.

So, I have my constraint graph, but I'll copy this one so I can change it. It's always good to back up your work before you make changes, right? So now, I want to add a new vertex, s, over here, my new source. I just take my constraint graph, whatever it looks like, and add in weight-zero edges to all the other vertices. Simple enough. Now, what did I do? What did you do?
882 1:02:14.1 --> 1:02:18.953 Well, I have a candidate source now which can reach all the 883 1:02:18.953 --> 1:02:21.799 vertices. So, shortest path from s, 884 1:02:21.799 --> 1:02:24.728 hopefully, well, paths from s exist. 885 1:02:24.728 --> 1:02:30 I can get from s to everywhere in weight at most zero. 886 1:02:30 --> 1:02:31.851 OK, maybe less. Could it be less? 887 1:02:31.851 --> 1:02:34.338 Well, you know, like v_2, I can get to it by 888 1:02:34.338 --> 1:02:36.71 zero minus two. So, that's less than zero. 889 1:02:36.71 --> 1:02:38.677 So I've got to be a little careful. 890 1:02:38.677 --> 1:02:40.933 What if there's a negative weight cycle? 891 1:02:40.933 --> 1:02:42.785 Oh no? Then there wouldn't be any 892 1:02:42.785 --> 1:02:44.347 shortest paths. Fortunately, 893 1:02:44.347 --> 1:02:47.413 we assume that there's no negative weight cycle in the 894 1:02:47.413 --> 1:02:49.785 original graph. And if you think about it, 895 1:02:49.785 --> 1:02:53.082 if there's no negative weight cycle in the original graph, 896 1:02:53.082 --> 1:02:55.396 we add an edge from s to everywhere else. 897 1:02:55.396 --> 1:02:58.52 We're not making any new negative weight cycles because 898 1:02:58.52 --> 1:03:01.586 you can start at s and go somewhere at a cost of zero, 899 1:03:01.586 --> 1:03:05 which doesn't affect any weights. 900 1:03:05 --> 1:03:08.92 And then, you are forced to stay in the old graph. 901 1:03:08.92 --> 1:03:12.84 So, there can't be any new negative weight cycles. 902 1:03:12.84 --> 1:03:17 So, the modified graph has no negative weight cycles. 903 1:03:17 --> 1:03:20.519 That's good because it also has paths from s, 904 1:03:20.519 --> 1:03:25 and therefore it also has shortest paths from s. 905 1:03:25 --> 1:03:30.376 The modified graph has no negative weight because it 906 1:03:30.376 --> 1:03:34.487 didn't before. And, it has paths from s. 907 1:03:34.487 --> 1:03:38.387 There's a path from s to every vertex. 
There may not have been before. Before, I couldn't get from v_2 to v_3, for example. Well, that's still true. But from s I can get to everywhere. So, that means that this graph, this modified graph, has shortest paths. Shortest paths exist from s. In other words, if I took all the shortest path weights, like if I ran Bellman-Ford from s, then I would get a bunch of finite numbers, d of v, for every vertex. That seems like a good idea. Let's do it.

So, shortest paths exist. Let's just assign x_i to be the shortest path weight from s to v_i. Why not? That's a good choice for a number, the shortest path weight from s to v_i. This is finite, because it's less than infinity and it's greater than minus infinity, so it's some finite number. That's what we need to do in order to satisfy these constraints. The claim is that this is a satisfying assignment. Why? Triangle inequality. Somewhere here we wrote the triangle inequality, and this looks a lot like the triangle inequality. In fact, I think that's the end of the proof. Let's see here. What we want to be true with this assignment is that x_j minus x_i is less than or equal to w_ij whenever (i, j) is an edge.
Or, let's say v_i, v_j: for every such constraint, so for (v_i, v_j) in the edge set. OK, so why is this true? Well, let's just expand it out. So, x_i is this delta, and x_j is some other delta. So, we have delta of (s, v_j) minus delta of (s, v_i). And on the right-hand side, well, w_ij, that was the weight of the edge from v_i to v_j. So, this is the weight of (v_i, v_j). OK, I'll rewrite this slightly: delta of (s, v_j) is less than or equal to delta of (s, v_i) plus w of (v_i, v_j). And that's the triangle inequality, more or less. The shortest path weight from s to v_j is at most the shortest path weight from s to v_i plus the weight of a particular path from v_i to v_j, namely the single edge (v_i, v_j), which could only be longer than the shortest path from v_i to v_j. And so, that makes the right-hand side bigger, which makes the inequality easier to satisfy: since the triangle inequality holds, this holds too. And that proves it. These were all equivalent statements, and this one we know to be true by the triangle inequality. Therefore, these constraints are all satisfied. Magic. I'm so excited here. So, we've proved that having a negative weight cycle is exactly when this system of difference constraints is unsatisfiable.
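To see the triangle-inequality step numerically, here is a small sketch with a made-up graph (not the one on the board): compute the shortest path weights from s by Bellman-Ford-style relaxation, then check delta(s, v_j) <= delta(s, v_i) + w(v_i, v_j) on every edge.

```python
def shortest_paths_from(source, n, edges):
    """Shortest path weights from `source` by repeated relaxation.

    `edges` is a list of (u, v, w) triples over vertices 0..n-1.
    Assumes no negative-weight cycle is reachable from `source`.
    """
    INF = float("inf")
    d = [INF] * n
    d[source] = 0
    for _ in range(n - 1):            # n-1 relaxation rounds suffice
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    return d

# Made-up graph: a source with weight-0 edges, one negative edge, no negative cycle.
edges = [(0, 1, 0), (0, 2, 0), (1, 2, -2), (2, 1, 3)]
d = shortest_paths_from(0, 3, edges)
for u, v, w in edges:                 # triangle inequality on every edge
    assert d[v] <= d[u] + w
```

The assertions in the loop are exactly the difference constraints with x_i = delta(s, v_i) plugged in.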
So, if we want to satisfy them, if we want to find the right assignment x, we run Bellman-Ford. Either it says, oops, there's a negative weight cycle. Then you are hosed: there is no solution. But that's the best you could hope to know. Otherwise, it says, there was no negative weight cycle, and here are your shortest path weights. You just plug them in, and bam, you have your x_i's that satisfy the constraints. Awesome.

Now, it wasn't just any graph. I mean, we started with constraints, algebra. We converted that into a graph by this transform, and then we added a source vertex, s. So, we had to build a graph to solve our problem. Very powerful idea. Cool. This is the idea of reduction: you can reduce the problem you want to solve to some problem you know how to solve. You know how to solve shortest paths when there are no negative weight cycles, or find out that there is a negative weight cycle, by Bellman-Ford.
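The whole reduction fits in a few lines. A minimal sketch, assuming the constraints x_j - x_i <= w arrive as triples (i, j, w) over variables x_1..x_n: build the constraint graph with the extra source, run Bellman-Ford, and either report a negative-weight cycle (unsatisfiable) or return x_i = delta(s, v_i).

```python
def solve_difference_constraints(n, constraints):
    """Solve {x_j - x_i <= w : (i, j, w) in constraints}.

    Returns a satisfying assignment [x_1, ..., x_n], or None when
    the constraint graph has a negative-weight cycle.
    """
    # Vertex 0 is the added source s, with weight-0 edges to everything.
    edges = [(0, v, 0) for v in range(1, n + 1)] + list(constraints)
    d = [0] + [float("inf")] * n
    # Bellman-Ford: the graph has n+1 vertices, so n rounds suffice.
    for _ in range(n):
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    # One extra pass: any further relaxation exposes a negative cycle.
    for u, v, w in edges:
        if d[u] + w < d[v]:
            return None
    return d[1:]  # x_i = delta(s, v_i)
```

For example, x_2 - x_1 <= 3, x_3 - x_2 <= -2, x_1 - x_3 <= -1 is satisfiable (that cycle has weight zero), while x_2 - x_1 <= 1, x_1 - x_2 <= -2 contains a weight -1 cycle and gets rejected.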
So, now we know how to solve difference constraints. It turns out you can do even more. Bellman-Ford does a little bit more than just solve these constraints. But first let me write down what I've been jumping up and down about. The corollary is that you can use Bellman-Ford: you make this graph, then you apply Bellman-Ford, and it will solve your system of difference constraints. So, let me put in some numbers here. You have m difference constraints, and you have n variables. And it will solve them in order m times n time. Actually, these numbers go up slightly, because we are adding n edges and one vertex, but assuming all of these numbers are nontrivial, and m is at least n, it's order mn time. OK, trying to avoid cases where some of them are close to zero. Good.

So, some other facts. That's what I just said. And we'll leave these as exercises because they're not too essential; the main thing we need is the corollary. But some other cool facts: Bellman-Ford actually optimizes some objective functions. So far we've been treating this as just a feasibility problem; we just want to know whether these constraints are satisfiable. In fact, you can add a particular objective function. You can't give it an arbitrary objective function, but here's one of interest: x_1 plus x_2 plus ... plus x_n. OK, but not just that. We have some constraints. OK, this is a linear program.
I want to maximize the sum of the x_i's subject to all the x_i's being nonpositive and the difference constraints. So, the difference constraints we had before; this is fine. We noticed at some point that you could get from s to everywhere with cost at most zero. So, we know that in this assignment all of the x_i's are nonpositive. That's not necessary in general, but it's true when you run Bellman-Ford. So if you solve your system using Bellman-Ford, which is no less general than anything else, you happen to get nonpositive x_i's. And so, subject to that constraint, it actually makes them as close to zero as possible in the L1 norm: it makes the sum as close to zero as it can, which here means making the values as small as possible in absolute value.

OK, it does more than that. It cooks, it cleans, it finds shortest paths. It also minimizes the spread: the maximum over all i of x_i, minus the minimum over all i of x_i. So, I mean, if you have your real line, and here are the x_i's, wherever they are, it minimizes this distance. And zero is somewhere over here. So, it tries to make the x_i's as compact as possible.
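The maximization claim can be sanity-checked by brute force on a tiny made-up system (mine, not the lecture's): the Bellman-Ford assignment is nonpositive, and no other nonpositive integer assignment achieves a larger sum.

```python
from itertools import product

# Made-up system: x_2 - x_1 <= 3, x_3 - x_2 <= -2, x_1 - x_3 <= -1.
constraints = [(1, 2, 3), (2, 3, -2), (3, 1, -1)]
n = 3

# Bellman-Ford over the constraint graph with added source 0.
edges = [(0, v, 0) for v in range(1, n + 1)] + constraints
d = [0] + [float("inf")] * n
for _ in range(n):
    for u, v, w in edges:
        d[v] = min(d[v], d[u] + w)
x_bf = d[1:]

def satisfies(x):
    return all(x[j - 1] - x[i - 1] <= w for i, j, w in constraints)

# No nonpositive integer assignment beats the Bellman-Ford sum.
best = max(sum(x) for x in product(range(-6, 1), repeat=n) if satisfies(x))
assert satisfies(x_bf) and all(v <= 0 for v in x_bf)
assert sum(x_bf) == best
```

The brute-force search only covers a small integer box, so this is a spot check of the property, not a proof.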
The spread is actually the L-infinity norm, if you know about norms from your linear algebra class, and the sum objective is the L1 norm. I think it minimizes every Lp norm.

Good, so let's use this for something. Let's solve a real problem, and then we'll be done for today. Next class we'll see the really cool stuff, the really cool application of all of this. For now, we'll see a cool but relatively simple application, which is VLSI layout. We talked a little bit about VLSI way back in divide and conquer. You have a bunch of chips, and you want to arrange them and minimize some objectives. So, here's a particular one; tons of problems come out of VLSI layout. Here's one of them. You have a bunch of features of an integrated circuit, and you want to somehow arrange them on your circuit without putting any two of them too close to each other. You have some minimum separation: at the very least, they should not be on top of each other. Probably you also need some separation to put wires in between, and so on. So: without putting any two features too close together. OK, so just to give you an idea, I have some objects, and I'm going to be a little bit vague about how this works.
You have some features: this stuff, some chips, whatever. We don't really care what their shapes look like. I just want to be able to move them around so that the gap at any point, so let me just think about this gap, this gap should be at least some delta. Or, I don't want to use delta; let's say epsilon. Good, a small number. So, I just need some separation between all of my parts.

And for this problem, I'm going to keep it pretty simple and just say that the parts are only allowed to slide horizontally. So, it's a one-dimensional problem. These objects are in 2D, or whatever, but I can only slide them in the x coordinate. So, to model that, I'm going to look at the left edge of every part and say, well, these two left edges should be at least some separation apart. So, I think of it as whatever the distance is, plus some epsilon. But, you know, if you have some funky 2D shapes, you have to compute it: well, this particular position is a little bit too close, because these parts come into alignment. But there's some constraint: for any two pieces, I can figure out how close they can get, and they should get no closer. So, I'm going to call this x_1, and I'll call this x_2.
So, we have some constraint like x_2 minus x_1 is at least d plus epsilon, or whatever you compute that weight to be. OK, so for every pair of pieces, I can do this: compute some constraint on how far apart they have to be. And now I'd like to assign these x coordinates; right now, they're just variables. I want to slide these pieces around horizontally in order to compactify them as much as possible, so they fit in the smallest chip that I can make, because it costs money, and time, and power, everything. You always want your chip small.

So, Bellman-Ford does that. All right, so Bellman-Ford solves these constraints, because it's just a bunch of difference constraints. And we know that they are solvable, because you could spread all the pieces out arbitrarily far. And it minimizes the spread, which minimizes the size of the chip I need: the max of x_i minus the min of x_i. So, it maximizes compactness, or minimizes the size of the chip. OK, this is a one-dimensional problem, so it may seem a little artificial, but the two-dimensional problem is really hard to solve. And this is, in fact, the best you can do with a nice polynomial-time algorithm.
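As a toy version of this layout problem (the separations below are made up), each requirement x_{i+1} - x_i >= sep_i becomes the difference constraint x_i - x_{i+1} <= -sep_i, which the Bellman-Ford reduction solves directly. This is a sketch, not a real VLSI tool.

```python
def solve_difference_constraints(n, constraints):
    """Bellman-Ford over the constraint graph with added source 0.

    constraints: triples (i, j, w) meaning x_j - x_i <= w.
    Returns [x_1..x_n], or None on a negative-weight cycle.
    """
    edges = [(0, v, 0) for v in range(1, n + 1)] + list(constraints)
    d = [0.0] + [float("inf")] * n
    for _ in range(n):
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    for u, v, w in edges:
        if d[u] + w < d[v]:
            return None
    return d[1:]

# Three parts in a row; required left-edge separations (width plus epsilon).
seps = [2.0, 3.5]
# x_{i+1} - x_i >= sep  <=>  x_i - x_{i+1} <= -sep: an edge v_{i+1} -> v_i.
constraints = [(i + 1, i, -sep) for i, sep in enumerate(seps, start=1)]
x = solve_difference_constraints(3, constraints)
spread = max(x) - min(x)  # separations are met with no slack
```

The resulting spread equals the sum of the required separations, so the layout is as compact as the constraints allow.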
There are other applications. If you're scheduling events in, say, a multimedia environment, and you want to guarantee that this audio plays at least two seconds after this video, while other things playing at the same time have to be within some gap of each other, then you get difference constraints; there are lots of papers about using Bellman-Ford to solve difference constraints in order to enable multimedia environments. OK, so there you go. And next class we'll see more applications of Bellman-Ford to all-pairs shortest paths. Questions? Great.